DNA short read alignment on apache spark
نویسندگان
چکیده
منابع مشابه
Scalability Potential of BWA DNA Mapping Algorithm on Apache Spark
This paper analyzes the scalability potential of embarrassingly parallel genomics applications using the Apache Spark big data framework and compares their performance with native implementations as well as with Apache Hadoop scalability. The paper uses the BWA DNA mapping algorithm as an example due to its good scalability characteristics and due to the large data files it uses as input. Resul...
متن کاملCGAP-Align: A High Performance DNA Short Read Alignment Tool
BACKGROUND Next generation sequencing platforms have greatly reduced sequencing costs, leading to the production of unprecedented amounts of sequence data. BWA is one of the most popular alignment tools due to its relatively high accuracy. However, mapping reads using BWA is still the most time consuming step in sequence analysis. Increasing mapping efficiency would allow the community to bette...
متن کاملImproving PacBio Long Read Accuracy by Short Read Alignment
The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long rea...
متن کاملApproximate Stream Analytics in Apache Flink and Apache Spark Streaming
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Applied Computing and Informatics
سال: 2020
ISSN: 2634-1964,2210-8327
DOI: 10.1016/j.aci.2019.04.002